Field of Invention

The present invention relates to data processing systems and more particularly relates to 
a distributed OLAP-based association rule generation method and system. 


Background of the Invention 

In the past several years, there has been a rise in the number of mass-merchandise
retailers that have many geographically distributed stores spanning cities, states,
and countries. With this increase in mass retailers, there has been an increase in the use by these
retailers of networked computer systems for continuously collecting transaction data related to
purchases. In this regard, merchants currently face the problem of data overload and are seeking
ways to convert this storehouse of data into intelligent information about their customers that
can be used for business decisions.

Since transaction data stems from daily purchases, returns, and exchanges that are
entered at all the cash registers of all the stores, millions of records can accumulate very
quickly. There is an ever-increasing need for merchants to interpret the data, garner
information regarding their customers' buying habits, and use this information to improve their
business decisions. For example, one use of this vast storehouse of data is to generate rules that
describe past buying trends. Once these past buying trends are determined, business plans
related to inventory and promotion can be devised accordingly.

Ways to use this data for business planning are also becoming increasingly
important in the world of electronic commerce, where transactions are done on-line (e.g.,
through the Internet) and involve customers that are located worldwide.

It is desirable to have rules that summarize the buying patterns of the customers. One
such type of rule that reflects customers' buying habits is referred to as a cross-sale association
rule. Cross-sale association rules describe the relationship of the sales of one item to the sales
of another item. These cross-sale association rules are quite beneficial to both merchants and
customers. For merchants, such a cross-sale association rule can help the merchant to plan
which products to offer together to better meet historical demand. For customers, it is less likely
that a desired product will be out-of-stock or that a separate trip to another store is needed to
purchase a related product because the current merchant does not carry that particular product.

There have been attempts to generate association rules that reflect customer behavior.
Unfortunately, these systems are very limited in several respects. First, these systems narrowly
define what is considered a transaction. For example, a single transaction is defined as those
products that are purchased and listed in a single receipt. Only those products that are listed on
the same receipt are considered to be "associated," and only these products are tallied and
counted for generation of association rules.

As can be appreciated, there are many transactions that should contribute to the
association rules, but instead are not captured by these prior art systems because of this narrow
definition. Consider an association rule that is designed to answer the question of how many
customers who bought a TV also bought a VCR. The prior art system would count only those
customers who bought both a TV and a VCR in a single transaction that used a single receipt. If
the same customer went to the same store one day after he purchased a TV to purchase a VCR,
this customer would not be counted since the purchases did not occur in a single transaction.
Even if a customer actually purchased the TV and VCR at the same time, but utilized two
different credit cards or requested two separate receipts for some reason, those purchases would
not contribute to the association rule because the purchases are on two separate receipts.
Accordingly, it would be desirable to have a mechanism that captures more transactions that
contribute to association rules, thereby generating association rules that more accurately reflect
reality.
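
By way of illustration only, the difference between receipt-based and customer-based counting can be sketched in Python as follows. The record layout, the sample data, and the function name buyers_of_both are assumptions made for this sketch and do not appear in the original disclosure.

```python
from collections import defaultdict

# Hypothetical purchase records: (receipt_id, customer, product).
purchases = [
    ("r1", "alice", "TV"), ("r1", "alice", "VCR"),  # both items on one receipt
    ("r2", "bob", "TV"),
    ("r3", "bob", "VCR"),                            # bought a day later, separate receipt
    ("r4", "carol", "TV"),
]

def buyers_of_both(records, key_index, a="TV", b="VCR"):
    """Count groups (receipts or customers) whose purchased items include both a and b."""
    baskets = defaultdict(set)
    for rec in records:
        baskets[rec[key_index]].add(rec[2])
    return sum(1 for items in baskets.values() if a in items and b in items)

per_receipt = buyers_of_both(purchases, key_index=0)   # grouped by receipt
per_customer = buyers_of_both(purchases, key_index=1)  # grouped by customer
```

In this sample, grouping by receipt misses the customer who bought the two items on different days, whereas grouping by customer counts that customer.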

It can be appreciated that the accuracy of such association rules depends on access to all
the available information and the capture of all transactions relevant to that rule. Accordingly, it
is desirable to generate these association rules based on as large a collection of transaction data
as possible, which may be gathered at multiple distributed sites. Unfortunately, doing so
requires that hundreds of millions of transaction records be processed daily. Understandably,
this poses substantial challenges to current systems and approaches for generating association rules.

One challenge is to provide continuous rather than one-time value to e-commerce. Prior
art data mining efforts are focused on analyzing historical data. In reality, however, data is
being continuously collected, and it is preferable to have a mechanism that mines data
continuously to dynamically detect trends and changes in real-time. For instance, prior art
methods are limited to generating a cross-sale association rule that describes the relationship of
the past sales of one item to the sales of another item. While such relationships are helpful in
making planning and promotion decisions, the changes in cross-sale associations may be even
more significant, since such changes usually reflect real-time trends, the reaction to a promotion,
or the cause of sales drops or rises. Unfortunately, the prior art systems are unable to reflect
such changes in cross-sale associations.
For example, suppose the sales of VCRs had been strongly associated with the sales of
TVs, but this association has recently weakened as TV buyers turned to buying DVDs instead of
VCRs. Such a change in the association helps explain or predict the slowdown of VCR sales.
As another example, a weakening association between sales of PCs and a specific brand of
printers may imply that customers have turned to buying another brand of printers. Accordingly, it is
desirable to have a mechanism to catch such dynamic association relationships.

A second challenge is how to enable a conventional system, which is configured to
process small amounts of data, to process very large data sets. In a conventional shopping
network, a huge volume of transaction records must be processed every day, and it is unlikely
that centralized processing will yield satisfactory results. The scalability issue becomes more
critical in the provision of the real-time data mining service described above. In order to scale up, a
mechanism is needed to distribute data processing, reduce data volumes at each local site by
summarization, and mine data incrementally at multiple levels of aggregation. Unfortunately,
the prior art does not provide a way to perform these tasks on very large data sets.

Accordingly, there remains a need for a system and method for generating association
rules that more accurately reflect reality, that can provide more flexible and powerful
information, and that overcome the challenges and disadvantages set forth previously.
SUMMARY OF THE INVENTION
In one embodiment, an architecture that includes multiple Local Data-warehouse/OLAP
Stations (LDOS) and a Global Data-warehouse/OLAP Station (GDOS) is provided. Each
LDOS mines transaction data and summarizes the local transaction data and local customer
behavior patterns. The GDOS mines the local transaction data provided by the LDOSs,
summarizes the local transaction data and generates global customer behavior patterns. An
OLAP-based global computation engine is provided at the GDOS for performing these tasks.
The OLAP-based global computation engine also includes a scoped association rule generation
module, an association rule having conjoint items generation module, and a functional association
rule generation module for generating association rules that can be expressed as
multidimensional and multilevel cubes.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a block diagram of a distributed data warehouse/OLAP-based system according
to one embodiment of the present invention.

FIG. 2 illustrates in greater detail the LDOS of FIG. 1.

FIG. 3 illustrates in greater detail the local computation engine of FIG. 2.

FIG. 4 illustrates in greater detail the GDOS of FIG. 1.

FIG. 5 illustrates in greater detail the global computation engine of FIG. 4.

FIG. 6 illustrates the cubes related to cross-sale association rules with one slice of each
cube.

FIG. 7 illustrates an example of distributed association rule mining.
DETAILED DESCRIPTION OF THE PRESENT INVENTION

An apparatus and method for a distributed and cooperative data warehousing,
OLAP, and data mining system is disclosed. In the description that follows, various terms such
as processors, servers, OLAP, data warehousing, mining, cubes, and association rules are used.
These are the common terms used by those skilled in the art of data warehousing and OLAP to
convey their ideas to each other. Although examples are given where commands from a
particular language are utilized, it is noted that the implementation of the present invention is not
limited to a particular programming language, as those skilled in the art can readily determine
the appropriate language that is best suited for their specific application.


2. System Architecture 

FIG. 1 is a block diagram illustrating a data processing system configured in accordance
with one embodiment of the present invention. The distributed and cooperative system of the
present invention may be implemented with a minimum of two layers of data warehouse/OLAP
stations: LDOSs and a GDOS. It is of course readily feasible for those skilled in the art to
structure the system with multiple levels of data warehouses based on the teaching of the present
invention. The LDOSs are responsible for local data mining and summarization, while the
GDOS is responsible for merging and mining the input data from LDOSs, and for providing the
mining results to LDOSs for business applications such as personalized promotions, inventory
management, etc.

LDOS 110, 130

A local data warehouse 114, 134, and one or more OLAP servers 112, 132, are
maintained in each LDOS 110, 130. The basic input data to an LDOS are transactions 102 that
are fed in daily and dumped to archive after use. The basic output data sent to the GDOS 150
periodically from an LDOS 110, 130, is a Profile Snapshot Cube 106 (PSC), which contains
partial information for customer profiling. Transaction data and related reference data are stored
in the warehouse. Each OLAP server 112, 132, includes a local transaction computation engine,
which is described in greater detail with reference to FIGS. 2 and 3, for building and
incrementally updating the PSC by mining new transactions flowing into the local data warehouse
114, 134, and for deriving patterns for local analysis. Loading transaction data 102 into the data
warehouse 114, 134, and then loading the warehoused data to the OLAP server 112, 132, for
generating PSCs 104 can be a periodic process (e.g., hourly or daily).




Besides feeding PSCs 104 to the GDOS 150 periodically, an LDOS 110, 130, also
receives data mining results, such as global association rules 159 or alerts, as feedback 160 from
the GDOS 150. The LDOS 110, 130 can access the GDOS 150 to get association rules and
customer profiles when necessary.

FIG. 2 is a block diagram illustrating the LDOS 110 of FIG. 1. The LDOS 110 includes a
data warehouse 114, an OLAP server 112 that has a local computation engine 330, and a
multi-dimensional database 340. The present invention powerfully extends the function of a traditional
OLAP server 112 by providing the local computation engine 330. In the preferred embodiment,
the local computation engine 330 of the present invention is an OLAP-based profile engine. As
will be described in greater detail hereinafter with reference to FIG. 3, the OLAP-based local
computation engine 330 of the present invention provides a scalable engine for delivering
powerful solutions for customer behavior profiling, pattern generation, analysis and comparison,
and data management.

First, the local computation engine (LCE) 330 builds and incrementally updates
customer buying behavior profiles by mining transaction records 102 flowing into the
data-warehouse 114 on a periodic basis. Second, the LCE 330 maintains profiles by staging data
between the data-warehouse 114 and an OLAP multidimensional database 340. For example, a
profile cube 108, a profile snapshot cube 106, and an updated profile cube 104 (which are part of
the OLAP multidimensional database 340) can be generated based on data received from the
transaction table 290 and the profile table 280. The profile cube 108, the profile snapshot cube
106, and the updated profile cube 104 are described in greater detail hereinafter. Third, the LCE
330 derives multilevel and multidimensional customer buying patterns from the updated profile
cube 104.

The data warehouse 114 includes a profile table 280 for storing customer profile
information and a transaction table 290 for storing sales transactions 102. For example,
transactions 102 may be loaded into the transaction table 290 on a periodic basis (e.g., on a daily
basis).

In one embodiment, the data warehouse 114 can be implemented with an Oracle-8 based
telecommunication data-warehouse, and the OLAP server 112 and multi-dimensional database
340 can be implemented with an Oracle Express multidimensional OLAP server. The local
computation engine 330 is preferably implemented by OLAP programming (i.e., by a program
written in a scripting language provided by the OLAP server 112).

FIG. 3 is a block diagram illustrating in greater detail the OLAP server 112 of FIG. 2. The
OLAP server 112 includes traditional OLAP analysis and visualization tools 320 that are
typically used for query and analysis of corporate data, such as sales, marketing, financial,
manufacturing, or human resources data. The OLAP server 112 also includes the local
computation engine (LCE) 330, which in accordance with one embodiment of the present
invention, is an OLAP-based scalable computation engine for creating profiles, updating
profiles, deriving shopping behavior patterns from profiles, and analyzing and comparing these
patterns.

The LCE 330 includes a local profile builder and update module (LPBUM) 340, a local
behavior pattern generation module (LBPGM) 350, and a feedback and association rule
utilization module (FARUM) 360. The local profile builder and update module (LPBUM) 340
builds and updates customer profiles by incrementally mining the transaction data that flows
periodically into the data-warehouse 114. Mining refers generally to the well-known process of
converting data in a first format (e.g., a record format suited for a relational database) into a
second format (e.g., a multi-dimensional cube format suited for a multi-dimensional database).

The local behavior pattern generation module (LBPGM) 350 derives customer behavior
patterns (e.g., shopping pattern cubes) from the customer profiles. The feedback and association
rule utilization module (FARUM) 360 receives the association rules and other feedback
generated by the GDOS 150 and utilizes the feedback information for business planning, such as
inventory management and sales promotion.

GDOS 150

A global data warehouse 154 and one or more OLAP servers 152 are maintained in the
GDOS 150. The GDOS 150 has bi-directional communications with the LDOSs 110, 130. It
combines the summary information from multiple LDOSs to build and incrementally update
global customer profiles, association rules, etc., which cannot be completed at a single LDOS,
and feeds back 160 the resulting profiles, rules and other derived objects, such as alternative
promotion plans, to the LDOSs. Multilevel and multidimensional customer shopping patterns
may be extracted, analyzed and compared. In a present implementation, the volume cubes 157
for generating association rules 159 are extracted from cubes representing customer profiles 156,
which will be described in more detail.

To achieve this, the GDOS and the LDOSs operate in a cooperative manner. The
GDOS relies on the LDOSs to reduce data load and computation load for enhanced scalability,
and the customer profiles and rules are generated at the GDOS by using the summary
information fed by the LDOSs in combination. As can be appreciated by those skilled in the art,
more stations, or levels thereof, may be introduced based on the particularity of the individual
systems.

FIG. 4 is a block diagram illustrating the GDOS 150 of FIG. 1. The GDOS 150 includes
a data warehouse 154, an OLAP server 152 that has a global computation engine (GCE) 530,
and a multi-dimensional database 440. The present invention powerfully extends the function of
a traditional OLAP server 152 by providing the global computation engine 530. In the preferred
embodiment, the global computation engine 530 of the present invention is an OLAP-based
engine. As will be described in greater detail hereinafter with reference to FIG. 5, the
OLAP-based global computation engine 530 of the present invention provides a scalable engine for
delivering powerful solutions for customer behavior profiling, pattern generation, and
association rule generation.

First, the global computation engine (GCE) 530 builds and incrementally updates
customer buying behavior profiles by mining local transaction records 104 flowing into the
data-warehouse 154 from the LDOSs (e.g., LDOS 110 and 130). Second, the GCE 530 maintains
profiles by staging data between the data-warehouse 154 and an OLAP multidimensional
database 440. For example, a global updated profile cube 156, one or more global behavior
pattern cubes 158, and one or more association rule cubes 159 (which are part of the OLAP
multidimensional database 440) can be generated based on data received from the global
transaction table 420 and the global profile table 410. These cubes are described in greater detail
hereinafter. Third, the GCE 530 derives multilevel and multidimensional customer buying
patterns 158 from the updated profile cube 156. Fourth, the GCE 530 generates one or more
association rule cubes 159 based on the behavior patterns 158.

The data warehouse 154 includes a global profile table 410 for storing customer
profile information and a global transaction table 420 for storing sales transactions 104 received
from the LDOSs. For example, transactions 104 may be loaded into the global transaction table
420 on a periodic basis (e.g., on a daily basis).

In one embodiment, the data warehouse 154 can be implemented with an Oracle-8 based
telecommunication data-warehouse, and the OLAP server 152 and multi-dimensional database
440 can be implemented with an Oracle Express multidimensional OLAP server. The global
computation engine 530 is preferably implemented by OLAP programming (i.e., by a program
written in a scripting language provided by the OLAP server 152).

FIG. 5 is a block diagram illustrating in greater detail the OLAP server 152 of FIG. 4. The
OLAP server 152 includes traditional OLAP analysis and visualization tools 520 that are
typically used for query and analysis of corporate data, such as sales, marketing, financial,
manufacturing, or human resources data. The OLAP server 152 also includes the global
computation engine (GCE) 530, which in accordance with one embodiment of the present
invention, is an OLAP-based scalable computation engine for creating profiles, updating
profiles, deriving shopping behavior patterns from profiles, and generating association rules
based on these patterns.

The GCE 530 includes a global profile builder and update module (GPBUM) 540, a
global behavior pattern generation module (GBPGM) 550, a scoped association rule generation
module 560, an association rule with conjoint items generation module 570, and a functional
association rule generation module 580. The GPBUM 540 builds and updates customer profiles
by incrementally mining the transaction data 104 that flows periodically into the data-warehouse
154. The GBPGM 550 derives customer behavior patterns (e.g., shopping pattern cubes) from
the customer profiles.

The scoped association rule generation module 560 generates association rules that have
different underlying bases (i.e., different populations over which the rule is defined). Two
examples of these scoped association rules are a cross-sale association rule that is based on
transactions and a cross-sale association rule that is based on customers. The association rule
having conjoint items generation module 570 generates association rules that include a conjoint
dimension (e.g., a cube where a cell in the association can cross two values). The functional
association rule generation module 580 generates association rules having predicates, a
consequent, and an antecedent, where the rule's predicates include variables, and the variables in the
consequent are functions of those in the antecedent.

The association generation modules (560, 570 and 580) generate different classes of
association rules, which are described in greater detail hereinafter, based on the customer
behavior patterns. These association rules, which have a global perspective that is not possible
at the local level, are provided to the LDOSs and specifically to the feedback and
association rule utilization module (FARUM) 360 of the LDOS for use in business planning.

One aspect of the present invention is directed to generating customer profiles to reflect 
customers' shopping behavior patterns, which in turn can be used for personalized marketing and 
commercial promotions. Also, the present invention continuously and incrementally mines 
association rules that can be used to identify opportunities for cross-selling, explain causes of 
sudden sale increases or drops, or analyze shopping trends. 

3. Customer Profiles and Shopping Patterns 

To take advantage of the aforementioned data warehouse system of GDOS and LDOSs,
customer profiles and shopping patterns are represented as cubes. A cube C has a set of
underlying dimensions D_1, ..., D_n, and is used to represent a multidimensional measure. Each
cell of the cube is identified by one value from each of the dimensions, and contains a value of
the measure. The measure can be said to be dimensioned by D_1, ..., D_n. The set of values of a
dimension D, called the domain of D, may be limited (by the OLAP limit operation) to a subset.
A sub-cube (slice or dice) can be derived from a cube C by dimensioning C by a subset of its
dimensions, and/or by limiting the value sets of these dimensions.
3.1 Profile Cubes ("PF")

A profile cube PF contains profiling information of multiple customers, and has 
dimensions kind, product, customer, merchant, time, and area. It is derived from the transaction 
data stored in the relational data warehouse. For example, the profile cube, PF, can be defined 
in the Oracle Express® Language as: 

define PF variable decimal <kind, sparse <product, customer, merchant, time, area>>

where dimension kind has values 'saleInUnits', 'saleInDollars', 'discounts', 'couponDiscounts',
'loyaltyDiscounts', 'paymentMethods', etc. Introducing this dimension facilitates the representation of
multiple measures by the single cube. Decimal data type is preferably used to cover all
numerical data types. A specific integer measure may be derived from a profile cube, and the
integer data type converted from the decimal data type.

Note that the use of the keyword "sparse" in the above definition instructs Oracle Express
to create a composite dimension <product, customer, merchant, time, area>, in order to handle sparseness
in an efficient way. A composite dimension is a list of dimension-value combinations. A
combination is an index into one or more sparse data cubes. The use of a composite dimension
allows Oracle Express to store sparse data in a compact form similar to relation tuples.

Profile cubes are currently maintained at both the GDOS and the LDOSs. At an LDOS, a
local profile cube is populated by means of binning. A transaction data record contains fields
with values mapping to each dimension of the cube. Such mapping is referred to as binning.
For example, '01Jan98 8:44am' is mapped to time-bin '01Jan98'. A transaction made at
'01Jan98 8:44am' and at the 'Safeway Market' in S.F. falls into the cell corresponding to time =
'01Jan98' and area = 'S.F.'.
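
As an illustrative sketch only, binning of a transaction record into a sparse profile cube might look as follows in Python. The dictionary-based cube, the record field names, and the functions bin_transaction and load_transaction are assumptions for this sketch; the described embodiment uses Oracle Express rather than Python.

```python
from collections import defaultdict
from datetime import datetime

def bin_transaction(record):
    """Map a raw transaction record to (product, customer, merchant, time, area) bins."""
    ts = datetime.strptime(record["timestamp"], "%d%b%y %I:%M%p")
    time_bin = ts.strftime("%d%b%y")  # drop the time of day, keep the day bin
    return (record["product"], record["customer"], record["merchant"],
            time_bin, record["area"])

def load_transaction(cube, record):
    """Accumulate the record's measures into the cells selected by binning."""
    cell = bin_transaction(record)
    cube[("saleInUnits",) + cell] += record["units"]
    cube[("saleInDollars",) + cell] += record["dollars"]

# Sparse cube: only cells that actually receive data are stored.
profile_cube = defaultdict(float)
load_transaction(profile_cube, {
    "timestamp": "01Jan98 8:44am", "product": "pen", "customer": "John Doe",
    "merchant": "Safeway Market", "area": "S.F.", "units": 2, "dollars": 3.5,
})
```

Storing only the populated cells in a dictionary keyed by dimension values loosely mirrors the role of the composite dimension described above.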

At the GDOS, a centralized profile cube with desired coverage in time, area, etc., is retrieved
from the database and updated by merging the appropriate local profile cubes, and then may be
stored back to the database, which can be accomplished using Oracle Express. Let PF be the
centralized cube and PF_1, ..., PF_n the local ones. Their merge is simply expressed as:
PF = PF + PF_1 + ... + PF_n. In this way, customer profiles are combined and updated incrementally as each
new local cube flows into the GDOS.
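
A minimal sketch of this cell-wise merge, assuming each cube is held as a Python mapping from cell coordinates to a numeric value (a representation chosen for illustration, not taken from the disclosure), is:

```python
from collections import Counter

def merge_profile_cubes(central, *local_cubes):
    """Return central + local_1 + ... + local_n, added cell by cell."""
    merged = Counter(central)
    for local in local_cubes:
        merged.update(local)  # Counter.update adds values for matching cells
    return merged

# Hypothetical cells keyed by (kind, product, customer, merchant, time, area).
central = {("saleInUnits", "pen", "Doe", "Sears", "01Jan98", "S.F."): 3}
local_1 = {("saleInUnits", "pen", "Doe", "Sears", "01Jan98", "S.F."): 2}
local_2 = {("saleInUnits", "ink", "Doe", "Sears", "02Jan98", "S.F."): 1}

merged = merge_profile_cubes(central, local_1, local_2)
```

Because the merge is a plain sum, local cubes can be folded in one at a time as they arrive, which is what makes the update incremental.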

3.2 Hierarchical Dimensions for Multilevel Pattern Representation
Shopping pattern cubes are derived from profile cubes and used to represent the
shopping behavior of individual customers. In order to represent such shopping behavior at
multiple levels, each of the dimensions is defined to be a hierarchical dimension, along which
the shopping pattern cubes can roll up.

A hierarchical dimension D contains values at different levels of abstraction. Associated
with D there are a dimension DL describing the levels of D, a relation DL_D mapping each
value of D to the appropriate level, and a relation D_D mapping each value of D to its parent
value (the value at the immediate upper level). Let D be an underlying dimension of a numerical
cube C. D, together with DL, DL_D and D_D, fully specifies a dimension hierarchy. They
provide sufficient information to roll up cube C along dimension D, that is, to calculate the total
of cube data at the upper levels using the corresponding lower-level data. A cube may be rolled
up along multiple underlying dimensions. In the applications presently implemented, the
following hierarchies are introduced:

The PRODUCT HIERARCHY is made up of the following objects: 

■ product: dimension with values at the product level (e.g. 'HP InkjetSOO'), product_category
level (e.g. 'printer'), etc, and a special value 'top' at top-level.

■ prodLevel: dimension with values 'prod_item', 'prod_category', 'prod_kind', 'top'.

■ prod_prod: parent relation (product, product) mapping each value to its parent, e.g.

prod_prod(product 'HP InkjetSOO') = 'printer'
prod_prod(product 'top') = NA

■ prodLevel_prod: level relation (product, prodLevel) mapping each value to its level, e.g.

prodLevel_prod(product 'HP InkjetSOO') = 'prod_item'
prodLevel_prod(product 'printer') = 'prod_category'


Analogously, the MERCHANT HIERARCHY is made up of dimension merchant;
dimension mercLevel with values 'store', 'store_category' and 'top'; parent relation merc_merc
and level relation mercLevel_merc. The CUSTOMER HIERARCHY is made up of dimension
customer, dimension custLevel with values 'shopper', 'shoppergroup', 'shoppercategory' and
'top'; parent relation cust_cust and level relation custLevel_cust. The TIME HIERARCHY is
made up of dimension time; dimension timeLevel with values 'day', 'month', 'year' and 'top';
parent relation time_time and level relation timeLevel_time. The AREA HIERARCHY is made
up of dimension area; dimension areaLevel with values 'city', 'state', 'region' and 'top'; parent
relation area_area and level relation areaLevel_area.
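
The role of a parent relation in rolling up a cube can be sketched as follows. This is a simplified Python illustration that assumes the input cube holds bottom-level values only; the product names, the two-dimensional cube, and the function rollup are hypothetical and are not the Oracle Express rollup operation itself.

```python
from collections import defaultdict

# Hypothetical parent relation in the style of prod_prod (None plays the role of NA).
prod_prod = {"inkjet_a": "printer", "laser_b": "printer", "printer": "top", "top": None}

def rollup(cube, parent, dim_index):
    """Add totals for every ancestor value along one dimension of a bottom-level cube."""
    rolled = defaultdict(float, cube)
    for cell, value in cube.items():
        ancestor = parent.get(cell[dim_index])
        while ancestor is not None:
            key = cell[:dim_index] + (ancestor,) + cell[dim_index + 1:]
            rolled[key] += value
            ancestor = parent.get(ancestor)
    return dict(rolled)

# Cells keyed by (product, time); values are units sold.
sale_units = {("inkjet_a", "01Jan98"): 3, ("laser_b", "01Jan98"): 2}
rolled = rollup(sale_units, prod_prod, dim_index=0)
```

Rolling up along several dimensions can be done by applying the same function once per dimension, provided each pass starts from cells at the bottom level of the dimension being rolled.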
For storing, combining and updating profile cubes, the bottom level of each dimension is
preferably used. For analysis, shopping pattern cubes are used, and they are allowed to roll up
along any hierarchical dimensions.

3.3 Deriving Shopping Pattern Cubes from Profile Cubes 

Various shopping patterns can be derived from profile cubes and handled as shopping 
pattern cubes. They may be used to represent the shopping behavior of a collection of customers 
or a single customer; they may be based on volumes or probability distributions; and they may 
be materialized (defined as variables) or not (defined as formulas). 

Multiple or single customer based patterns 

For example, a cube representing a single measure, SaleUnits, may be defined as a
formula (view) of the above profile cube PF by the following.

define SaleUnits formula int <product, customer, merchant, time, area>
EQ PF(kind 'saleInUnits')

A cube representing the same measure for a single customer, say, 'Doe', is defined as
define SaleUnits.1 formula int <product, merchant, time, area>
EQ PF(kind 'saleInUnits', customer 'Doe')
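
For readers unfamiliar with cube formulas, the effect of fixing one or more dimensions to a single value can be approximated by the following Python sketch. The dictionary-based cube, the sample cells, and the helper slice_cube are assumptions for illustration; the sketch materializes the slice, whereas a formula in the described embodiment is evaluated as a view.

```python
def slice_cube(cube, dims, **fixed):
    """Fix some dimensions to single values and return the sub-cube over the rest."""
    keep = [i for i, d in enumerate(dims) if d not in fixed]
    out = {}
    for cell, value in cube.items():
        if all(cell[dims.index(d)] == v for d, v in fixed.items()):
            out[tuple(cell[i] for i in keep)] = value
    return out

DIMS = ("kind", "product", "customer", "merchant", "time", "area")
PF = {
    ("saleInUnits", "pen", "Doe", "Sears", "01Jan98", "S.F."): 4,
    ("saleInUnits", "pen", "Roe", "Sears", "01Jan98", "S.F."): 1,
    ("saleInDollars", "pen", "Doe", "Sears", "01Jan98", "S.F."): 6.0,
}

sale_units = slice_cube(PF, DIMS, kind="saleInUnits")
sale_units_doe = slice_cube(PF, DIMS, kind="saleInUnits", customer="Doe")
```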

Volume or probability distribution based patterns 

In a volume-based shopping pattern cube, each cell value is a quantitative measure. The 
cube may be rolled up along its hierarchical dimensions. For example, cube SaleUnits defined 
above can be rolled up along dimension product using relation prod_prod, and analogously, 
along dimensions customer, merchant, time, area. The cell values of this cube are the number 
of purchased products falling to the given 'slot' of time, area, etc. For example: 

SaleUnits (product 'pen', customer 'John Doe', merchant 'Sears', time '01Jan98', area 'San Francisco')

gives the number of pens purchased by John Doe at Sears in San Francisco on 01Jan98; 

SaleUnits.Doe (product 'pen', customer 'John Doe', merchant 'top', time 'top', area 'top')
gives the number of pens purchased by John Doe anywhere and anytime covered by the
profiling period and area.

Cubes representing probability distribution based shopping patterns are derived from
volume-based pattern cubes. They provide more fine-grained representation of dynamic
behavior than fixed value based ones. They also allow shopping patterns corresponding to
different lengths of profiling interval to be compared.

Cubes representing different probability measures can be derived from the profile cube.
For example, the following cube, PLD, represents the probability distributions of loyalty
discounts over all discounts along multiple dimensions:

define PLD variable decimal <customer, product, merchant, time, area> 
It can be calculated through the following steps. 

• Define cubes LoyaltyDiscount and Discount over dimensions customer, product, merchant, time,
area

• Populate LoyaltyDiscount and Discount as sub-cubes of PF with dimension kind limited to
"LoyaltyDiscount" and "Discount" respectively

• Roll up LoyaltyDiscount and Discount along dimensions product, merchant, time and area

• Then PLD = LoyaltyDiscount / Discount (please note this is a cell-wise operation)
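
The final cell-wise division can be sketched as below, assuming the two rolled-up cubes are available as Python mappings keyed by cell coordinates. The function cellwise_ratio and the sample values are hypothetical, and cells with a zero total are simply skipped here, a choice the original text does not specify.

```python
def cellwise_ratio(numerator, denominator):
    """Divide two cubes cell by cell, skipping cells whose denominator is zero."""
    return {
        cell: numerator.get(cell, 0.0) / total
        for cell, total in denominator.items()
        if total != 0
    }

# Hypothetical rolled-up cells keyed by (customer, product).
loyalty_discount = {("Doe", "pen"): 2.0}
discount = {("Doe", "pen"): 8.0, ("Doe", "ink"): 4.0, ("Doe", "paper"): 0.0}

pld = cellwise_ratio(loyalty_discount, discount)
```

Each resulting cell is the share of all discounts in that cell that were loyalty discounts, which is why patterns taken over profiling intervals of different lengths remain comparable.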

Probability distributions measured from different standpoints may also be derived. 

• a merchant, such as Sears chain store, can discover the overall feedback to a promotion by 
summarizing information from multiple distributed locations; 

• the shopping behavior of individual customers, e.g. the purchase volume, the loyalty to 
discount offer, can be compared with the overall shopping behavior for guiding personalized 
promotion; 

• customers can get shopping guidelines by examining the probability distributions of products 
sold in multiple areas. 

For efficiency as well as consistency, it is preferable to store profile cubes persistently in 
the data-warehouse. Shopping patterns, either based on volume or probability, can be derived 
on the fly (at analysis time) using the OLAP engine for computation. This shows the simplicity, 
and yet the power, of using OLAP to handle customer profiling. 

4. Extended Association Rules 

One of the applications of the distributed data-warehouse/OLAP system of the present 
invention is to enhance association rule mining. In e-commerce, association rules benefit both 
merchants and customers. However, association rules are typically created and incrementally 
updated from hundreds of millions of transaction records generated daily. 

The distributed data warehousing and OLAP system of the present invention can 
distribute data processing, reduce data volumes at each local site by summarization, and mine 
data incrementally at multiple levels. In addition to scaling up rule mining, this system can 
combine information from different locations for generating association rules with enhanced 
expressive power. As will be described in the following, multilevel and multidimensional 
association rules can be extended by introducing scoped association rules, association rules with 
conjoint items and functional association rules, and the rules can be computed using data cubes 
and OLAP. 

4.1 Scoped Multidimensional and Multilevel Association Rules 

As can be appreciated by those skilled in the art, association rules provide a quantitative 
measurement of the association relationships between facts. Association rule mining aims at 
inferring such relationships from transaction summary data. For example, a cross-sale 
association rule is used to answer "how many customers who bought product A also bought product B in one 
month?" An association rule can be simply expressed by X => Y, where X is its antecedent, and Y 
is its consequent and they are conjunctive predicates. Related to each association rule there is a 
CONFIDENCE and a SUPPORT. 

In the above example, if 80% of the customers who bought A also bought B, and only 
10% of all the customers bought both, it can be said that the association rule has confidence 80% 
and support 10%. Given application-specific minimum support and confidence thresholds, a 
rule is considered strong if it satisfies these thresholds. 



Base. 

An association rule has an underlying base B that contains the population over which the 
rule is defined. The scoped association rule generation module 560 can generate association rules 
that have different bases. For example, the scoped association rule generation module 560 can 
generate a cross-sale association rule that is based on transactions, as: 

x ∈ Transactions: contain_product(x, A) => contain_product(x, B), 

or based on customers, as: 

x ∈ Customers: buy_product(x, A) => buy_product(x, B), 

regardless of whether they made such purchases in the same transaction or not. 

In these examples, the association rules use binary predicates, with the first place denoting a 
base element and the second place denoting an item. The items are elements 
of a set of products. 

The ability of the present invention to generate association rules with different bases is 
important to enable cooperative rule mining between GDOS and LDOSs. For example, if the 
customers of interest shop at several locations covered by different LDOSs, then customer based 
cross-sale association rules should be mined at GDOS on the summary data fed from multiple 
LDOSs. If the customers of interest are partitioned geographically and covered by individual 
LDOSs, such rules can be first mined locally at each LDOS and then assembled in the GDOS. 

Scoped Association Rule. 

As noted previously, scoped association rule generation module 560 generates 
association rules with different bases, which are referred to as scoped association rules. For a 
rule 

B: X => Y, 

the set of base elements in B that match X is denoted by Px, and its cardinality is denoted by |Px|. 
The confidence of rule X => Y in B, denoted by conf_B(X => Y), can be calculated by 
conf_B(X => Y) = |Px ∩ Py| / |Px|, ranging from 0 to 1. The support of rule X => Y in B, denoted by 
supp_B(X => Y), can be calculated by supp_B(X => Y) = |Px ∩ Py| / |B|. For simplicity, when the base B is 
understood from the context, the suffix B is dropped from conf and supp. 
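
The effect of the base on confidence and support can be illustrated with a short Python sketch. The transactions and the helper scoped_rule are hypothetical; the point shown is only that the same purchases yield different measures when the base is Transactions rather than Customers.

```python
# Hypothetical transactions: (transaction id, customer, set of products).
transactions = [
    ("t1", "ann", {"A", "B"}),
    ("t2", "bob", {"A"}),
    ("t3", "bob", {"B"}),
    ("t4", "cat", {"A"}),
    ("t5", "dan", {"C"}),
]

def scoped_rule(items_by_element, x, y):
    """Return (confidence, support) of x => y over the given base."""
    p_x = {e for e, items in items_by_element.items() if x in items}
    p_y = {e for e, items in items_by_element.items() if y in items}
    both = len(p_x & p_y)
    confidence = both / len(p_x) if p_x else 0.0
    support = both / len(items_by_element)
    return confidence, support

# Base = Transactions: each transaction is a base element.
by_transaction = {tid: items for tid, _cust, items in transactions}

# Base = Customers: everything a customer bought, in any transaction.
by_customer = {}
for _tid, cust, items in transactions:
    by_customer.setdefault(cust, set()).update(items)

conf_t, supp_t = scoped_rule(by_transaction, "A", "B")
conf_c, supp_c = scoped_rule(by_customer, "A", "B")
```

In this sample, customer bob bought A and B in separate transactions, so he counts toward the customer-based rule but not toward the transaction-based rule.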

Multidimensional Association Rule. 

The scoped association rule generation module 560 also provides multidimensional 
association rules, e.g. 

[x ∈ Customers: buy_product(x, 'A') => buy_product(x, 'B')] | merchant = 'Sears', 
area = 'Los Angeles', time = 'Jan98' 

In this example, customer is the base, products are the items, and merchant, area and time are 
underlying features of the rule. Essentially, the base of multidimensional rules is dimensioned 
by the features. 

Multilevel Association Rule. 

Further, the scoped association rule generation module 560 can provide multilevel 
association rules. For example, an area may be represented at city level (e.g. San Francisco) or 
at state level (e.g. California); a time may be represented at day, month or year levels. 
Accordingly, the scoped association rule generation module 560 can generate and specify 
association rules at different area levels and time levels, e.g. 

[x ∈ Customers: buy_product(x, 'A') => buy_product(x, 'B')] | merchant = 'Sears', 
area = 'Los Angeles', time = 'Jan98' 

[x ∈ Customers: buy_product(x, 'A') => buy_product(x, 'B')] | merchant = 'Sears', 
area = 'California', time = 'Jan98' 

[x ∈ Customers: buy_product(x, 'A') => buy_product(x, 'B')] | merchant = 'Sears', 
area = 'California', time = 'Year98' 

For multidimensional association rules, the cardinalities, confidence and support are 
dimensioned by, or as functions of, the features. 



4.2 Cube-based Association Rule Mining 

Association rules are represented by cubes. Volume cubes, where cell values are counts, 
are used for deriving and measuring multidimensional |Px ∩ Py|, |Px|, and |B| in the 
intermediate steps of computing multidimensional association rules. 



Volume cube. 

A volume cube contains multidimensional counts for deriving association rules. For 
example, cross-sale association rules are derived from volume cubes such as 

SaleUnits (customer, product, merchant, time, area). 

Each cell contains the number of purchased units dimensioned by customer, product, merchant, 
time and area. This cube is derived from a profile cube and materialized for mining cross-sale 
association rules. 

Volume cubes such as SaleUnits are maintained at both the GDOS and the LDOSs. At an LDOS, 
a local cube is populated from transaction data. At the GDOS, a centralized cube with the desired 
coverage in time, area, etc., is retrieved from the database and updated by merging local cubes. 

The dimensions of a volume cube, as well as other association rule related cubes, can be 
classified into the following categories for the purpose of association rule mining: 

- Item dimensions on which volume data for quantifying the association relationship 
are counted, such as the product dimension in the above example. 

- Base dimensions on which rules are quantified, such as the customer dimension. A 
volume cube may be rolled up along its hierarchical dimensions, but not along a base dimension. 
For example, aggregating the number of purchased products along the customer dimension 
to a high-level customer value, say, 'engineer', may not be meaningful for deriving cross-sale 
associations: if one engineer buys milk and another buys eggs, this does not imply any 
meaningful cross-sale association. 

- Feature dimensions on which the generated rules may be dimensioned such as 
merchant, time and area. 

A volume cube Cv can be used for deriving the instances of rule X => Y if it has a base 
dimension that represents the base of the rule, and the association conditions for qualifying X ∧ Y 
are definable on Cv. For deriving cross-sale association rules from cube SaleUnits, an 
association condition can be: 

for each base and feature dimension, Cv(product A) > 0 ∧ Cv(product B) > 0 

If association conditions used to compute multidimensional |Px ∩ Py| are definable on 
Cv, then another kind of condition, called antecedent conditions, which are used to compute 
multidimensional |Px|, are also definable on Cv, such as: 

for each base and feature dimension, Cv(product A) > 0 



Association cube. 

The association cube Ca for rule X => Y gives a volume-based measure of multidimensional 
association relationships that are computed from the volume cube Cv, and is used to derive the 
confidence cube and the support cube of association rules. More particularly, it maintains 
dimensioned |Px ∩ Py|, i.e. the number of base elements that satisfy X ∧ Y. 

Usually Ca is dimensioned differently from Cv. In the cross-sale association rule 
example, the association cube is defined as: 

CrossSales (product, product2, customer_group, merchant, time, area) 

A cell of this cube, CrossSales (product 'A', product2 'B', customer_group 'engineer', merchant 'Sears', time 'Jan98', 
area 'Los Angeles') = 4500 means that there are 4,500 customers who are engineers and who bought item 
A as well as item B at a Sears store in Los Angeles in Jan98. 

The item dimensions, base dimension and feature dimensions of an association cube are 
explained below. 

- Item dimensions underlying the counts for deriving association rules, such as 
dimensions product and product2 for the above CrossSales cube. product2 has the same set 
of values as product, and it is called the mirror dimension of product. A mirror dimension 
can be introduced simply because the cross-sale association rule involves more than one 
element of the item dimension. 

- Base dimension underlying the base of rules, such as the customer dimension. 
Unlike the volume cube, the association cube does not necessarily have to be dimensioned 
by the base dimension. However, rules can be dimensioned by a dimension with each 
value identifying a group of base dimension values at bottom levels. In cube CrossSales 
shown above, the hierarchical dimension customer_group is introduced, which has levels 
'customer_profession', 'customer_category' and 'top'. A relation is also defined for relating 
customers and customer groups. For example, a value of the customer_group dimension, 
say, "engineer", is used to identify a group of individual customers who are engineers. 

- Feature dimensions such as merchant, time and area, by which rules are 
dimensioned. 

Population cube and Base cube. 

The population cube Cp and the base cube Cb for rule X => Y are also derived from the 
volume cube Cv. Cp is used to measure dimensioned |Px|, i.e. the numbers of base elements 
satisfying X. Cb is used to represent dimensioned |B|. For the above cross-sale rules, the 
population cube is defined as: 

NumOfBuyers (product, customer_group, merchant, time, area) 

A cell of this cube, NumOfBuyers (product 'A', customer_group 'engineer', merchant 'Sears', time 'Jan98', area 'Los 
Angeles') = 10000 means that there are 10,000 customers who are engineers and who bought item A at Sears in Los 
Angeles in Jan98. The base cube is defined as: 

NumOfShoppers (customer_group, merchant, time, area) 

Note that NumOfShoppers is not aggregated from NumOfBuyers, as a single customer may buy 
multiple products. 
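
A small Python sketch, using invented purchases for a single (customer_group, merchant, time, area) slot, illustrates why the base cube cannot be obtained by summing the population cube over products: a customer who buys several products would be counted once per product.

```python
# Hypothetical purchases in one slot: customer -> products bought.
bought = {"ann": {"A", "B"}, "bob": {"A"}, "cat": {"B", "C"}}

# Population cube slice (NumOfBuyers): number of buyers per product.
num_of_buyers = {}
for products in bought.values():
    for p in products:
        num_of_buyers[p] = num_of_buyers.get(p, 0) + 1

# Base cube cell (NumOfShoppers): number of distinct shoppers.
num_of_shoppers = len(bought)

# Summing buyers over products overcounts multi-product customers.
naive_sum = sum(num_of_buyers.values())
```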

Confidence cube and Support cube. 

The confidence of rule X => Y, defined as |Px ∩ Py| / |Px|, and the support, defined as 
|Px ∩ Py| / |B|, are represented as cubes Cf and Cs. Cf is derived from Ca and Cp, and Cs is 
derived from Ca and Cb. They have the same dimensions as Ca. For the above cross-sale rules, 
the confidence cube and support cube are defined as: 

Confidence (product, product2, customer_group, merchant, time, area) 
Support (product, product2, customer_group, merchant, time, area) 

FIG. 6 shows the cubes related to cross-sale association rules, with one slice of each 
cube. The volume-cube 210 is generated from transactions; the association-cube 230, base-cube 
220 and population-cube 240 are derived from the volume cube; the confidence-cube 260 is derived from 
the association cube 230 and population cube 240; and the support-cube 250 is derived from the 
association-cube 230 and base-cube 220. The slices of these cubes shown in the figure 
correspond to the same list of values in dimensions merchant, time, area and customer_group. 

Multidimensional and multilevel association rules. 

Representing association rules by cubes and underlying cubes by hierarchical dimensions 
naturally supports multidimensional and multilevel rules. Also, these rules are well organized 
and can be easily queried. 

First, cells of an association cube with different dimension values are related to 
association rule instances in different scopes. In the association cube CrossSales, cell 

CrossSales (product 'A', product2 'B', customer_group 'engineer', merchant 'Sears', area 'Los 
Angeles', time 'Jan98') 

represents the following multidimensional rule: 

[x ∈ Customers: buy_product(x, 'A') => buy_product(x, 'B')] | customer_group = 'engineer', 
merchant = 'Sears', area = 'Los Angeles', time = 'Jan98'. 

If this cell has value 4500, and the corresponding cell in the population cube has value 10000, 
then this rule has confidence 0.45. 

Next, as the cubes representing rules have hierarchical dimensions, they represent not 
only multi-dimensional but also multi-level association rules. For example, the following cells 

CrossSales (product 'A', product2 'B', customer_group 'engineer', merchant 'Sears', area 
'California', time 'Jan98') 

and 

CrossSales (product 'A', product2 'B', customer_group 'engineer', merchant 'Sears', area 
'California', time 'Year98') 

represent association rules at different area levels (i.e. city level and state level) and time levels 
(i.e. month level, year level) as 

[x ∈ Customers: buy_product(x, 'A') => buy_product(x, 'B')] | customer_group = 'engineer', 
merchant = 'Sears', area = 'California', time = 'Jan98' 

and 

[x ∈ Customers: buy_product(x, 'A') => buy_product(x, 'B')] | customer_group = 'engineer', 
merchant = 'Sears', area = 'California', time = 'Year98'. 

The cell 

CrossSales (product 'A', product2 'B', customer_group 'top', merchant 'top', area 'top', time 
'top') 

represents the customer-based cross-sale association rule for all customers, merchants, 
areas, and time in the given ranges of these dimensions, expressed as: 

[x ∈ Customers: buy_product(x, 'A') => buy_product(x, 'B')] 

4.3 Generating Association Rule Related Cubes 

Cube based association rules are represented by cubes and generated in terms of cube 
manipulations. In most of the intermediate steps, these operations are performed on sub-cubes. 
In fact, manipulating sub-cubes is basic to cube based association rule mining. 

Converting one cube, C, into another, C', means populating the cells of C' with the values 
calculated from certain cell values of C. Very often, such populating operations are performed 
on sub-cubes. Given a cube C[D] with underlying dimensions D1, ..., Dn, which can be 
hierarchical, a sub-cube (dice) of C is formed by limiting the values of one or more 
dimensions. The typical way to relate a pair of sub-cubes of C and C' is to select a sub-cube in 
one, say, C, and map the dimension values limited to that sub-cube to the (same or related) 
dimension values underlying a corresponding sub-cube of C'. 

The basic task of the current OLAP based association rule mining, either at the GDOS or 
at an LDOS, is to convert a volume cube, i.e. the cube representing the purchase volumes of 
customers dimensioned by product, area, etc., into an association cube, a base cube and a 
population cube. These cubes are then used to derive the confidence cube and the support cube 
of multidimensional association rule instances. 

An association rule whose antecedent and consequent come from the same dimension is 
referred to as a cross association rule. Cross-sale association rules are one kind of cross 
association rule. A general algorithm for generating cross-association rules can be developed, 
with the following example illustrating the derivation of multilevel, multidimensional and 
scoped cross-sale association rules from the volume cube SaleUnits. Related cubes are as 
defined before. The following should be noted: 

- Cross-sale association rules at multiple product levels are generated. That 
is, at item level (e.g. how the sale of a particular CD is associated with the sale of 
another); at category level (e.g. how the sale of beer is associated with the sale of 
diapers, regardless of the brands of these products); and at kind level. For this purpose 
the ProductLevel dimension is introduced. However, for semantic clarity, rules crossing 
product levels are not provided. 

- In the volume-cube SaleUnits, only the bottom-level values of the customer 
dimension (even if it is a hierarchical dimension) are currently presented. In the 
association-cube CrossSales, the value of the customer_group dimension, e.g. "engineer", is 
mapped to a set of individual customers who are engineers, in order to represent 
individual based association in the scope of engineers. 

- Mirror dimension product2 is introduced to represent association rules 
across products. 

The steps of cross-sale association rule mining are listed below: 

• Roll up the volume cube SaleUnits by aggregating it along the merchant, time and area dimensions. 

• Derive cubes NumOfBuyers and NumOfShoppers 

for each c in customer_group { 

■ limit customer values to those corresponding to c at the bottom level of the customer dimension, for 
the underlying subcubes of SaleUnits, NumOfBuyers and NumOfShoppers. 

■ populate the resulting subcube of NumOfBuyers that is dimensioned by product, merchant, area, 
time, based on the antecedent condition SaleUnits > 0, i.e. each cell is assigned the number of 
individual customers corresponding to the underlying dimensions that satisfy the antecedent 
condition. 

■ populate the resulting subcube of NumOfShoppers with the counts of customers dimensioned by 
merchant, area, time (not by product) that satisfy the antecedent conditions. 

} 

• Derive cube CrossSales 

for each product level (i.e. item level, category level and kind level) { 

■ limit product to the values at this level, and populate product2 with the current range of product. 

■ for each c in customer_group { 

limit customer values to those corresponding to c at the bottom level of the customer dimension, 

for each p1 in product { 

for each p2 in product2 { 

dimensioned by merchant, time, area, assign CrossSales the total number of 
customers that satisfy the association conditions 

SaleUnits(product p1) > 0 and SaleUnits(product2 p2) > 0 

} 

} 

} 

} 

• Derive cube Confidence and cube Support (cell-wise operation) 

■ Confidence = CrossSales / NumOfBuyers 
■ Support = CrossSales / NumOfShoppers 

(Confidence, Support and CrossSales are dimensioned by product, product2, customer_group, 
merchant, time, area; 
NumOfBuyers is dimensioned by product, customer_group, merchant, time, area; 
NumOfShoppers is dimensioned by customer_group, merchant, time, area) 
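
The derivation steps listed above can be sketched in Python as follows. This is an illustrative sketch only, not the OLAP implementation: cubes are modelled as dictionaries, the sample cells and the group_of relation are hypothetical, the volume cube is assumed to be already rolled up to the desired merchant, time and area levels, a single product level is used, and only pairs of distinct products are counted.

```python
from collections import defaultdict
from itertools import permutations

# Hypothetical volume cube: (customer, product, merchant, time, area) -> units.
sale_units = {
    ("ann", "A", "Sears", "Jan98", "LA"): 2,
    ("ann", "B", "Sears", "Jan98", "LA"): 1,
    ("bob", "A", "Sears", "Jan98", "LA"): 1,
    ("cat", "B", "Sears", "Jan98", "LA"): 3,
    ("dan", "A", "Sears", "Jan98", "LA"): 1,
    ("dan", "B", "Sears", "Jan98", "LA"): 0,
}
# Hypothetical relation between customers and customer groups.
group_of = {"ann": "engineer", "bob": "engineer",
            "cat": "engineer", "dan": "teacher"}

# Products each customer bought (SaleUnits > 0), grouped by the scope
# (customer_group, merchant, time, area).
baskets = defaultdict(lambda: defaultdict(set))
for (cust, prod, merchant, time, area), units in sale_units.items():
    if units > 0:
        baskets[(group_of[cust], merchant, time, area)][cust].add(prod)

num_of_buyers = defaultdict(int)    # (product, *scope) -> |Px|
num_of_shoppers = defaultdict(int)  # scope -> |B|
cross_sales = defaultdict(int)      # (product, product2, *scope) -> |Px ∩ Py|

for scope, by_customer in baskets.items():
    num_of_shoppers[scope] = len(by_customer)
    for products in by_customer.values():
        for p in products:
            num_of_buyers[(p, *scope)] += 1
        # Association condition: both products bought by the same customer.
        for p1, p2 in permutations(sorted(products), 2):
            cross_sales[(p1, p2, *scope)] += 1

# Cell-wise derivation of the confidence and support cubes.
confidence = {k: n / num_of_buyers[(k[0], *k[2:])] for k, n in cross_sales.items()}
support = {k: n / num_of_shoppers[k[2:]] for k, n in cross_sales.items()}
```

A rule instance such as A => B for engineers at Sears in LA in Jan98 can then be looked up in confidence and support by its cell key and compared against the chosen thresholds.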

Rules with confidence and support that meet predetermined thresholds can be utilized for 
business planning purposes. These predetermined thresholds can vary from application to 
application and can depend on many factors, such as the particular item or product, the 
geographical area, and type of market. 

4.4 Mining Association Rules with Conjoint Items. 

The present invention utilizes cubes with conjoint dimensions to represent refined 
multidimensional association rules. For example, the association rule with conjoint items 
generation module 570 can derive association rules across time. Time-variant, or temporal, 
association rules such as 

[x ∈ Customers: buy_product(x, 'A', 'Jan98') => buy_product(x, 'B', 'Feb98')] | area = 'Los 
Angeles' 

can be used to answer such questions as "How are the sales of B in Feb98 associated with the sales of A in 
Jan98?" The items in this rule are value pairs of dimensions product and time. 

In order to specify this kind of association rule, a conjoint dimension <product, time> is 
introduced, and mirrored to dimension <product2, time2>. This allows a cell in the association 
cube to cross two time values. Accordingly the association cube, base cube and population cube 
are defined as: 

Association cube: CrossSales.2 (<product, time>, <product2, time2>, customer_group, merchant, 
area) 
Population cube: NumOfBuyers.2 (<product, time>, customer_group, merchant, area) 
Base cube: NumOfShoppers.2 (customer_group, merchant, area). 

The confidence cube and the support cube are defined as follows: 

Confidence cube: Confidence.2 (<product, time>, <product2, time2>, customer_group, merchant, 
area) 
Support cube: Support.2 (<product, time>, <product2, time2>, customer_group, merchant, area). 

The steps for generating these cubes are similar to the ones described before. The differences 
are that a cell is dimensioned by, besides others, <product, time> and <product2, time2>, and 
the template of the association condition is: 

SaleUnits(<product p1, time t1>) > 0 and SaleUnits(<product2 p2, time2 t2>) > 0, 

where in any instance of this condition, the time expressed by the value of time2 is not 
contained in the time expressed by the value of time. The template of the antecedent condition 
is: SaleUnits(<product p1, time t1>) > 0. 
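
As an illustration of the conjoint-item conditions above, the following Python sketch counts the cells of a CrossSales.2-like cube for one scope. The purchases and the integer month index are hypothetical, and all time values are assumed to be at the month level, so that the requirement that time2 not be contained in time reduces to the two months being different.

```python
from collections import defaultdict

# Hypothetical purchases: customer -> set of (product, month index) pairs,
# with Jan98 = 0 and Feb98 = 1.
purchases = {
    "ann": {("A", 0), ("B", 1)},
    "bob": {("A", 0), ("B", 0)},
    "cat": {("A", 0)},
    "dan": {("B", 1)},
}

cross_sales_2 = defaultdict(int)    # ((p1, t1), (p2, t2)) -> |Px ∩ Py|
num_of_buyers_2 = defaultdict(int)  # (p1, t1) -> |Px|

for items in purchases.values():
    # Antecedent condition: the customer bought p1 at time t1.
    for item in items:
        num_of_buyers_2[item] += 1
    # Association condition: the customer also bought p2 at a different time t2.
    for (p1, t1) in items:
        for (p2, t2) in items:
            if t2 != t1:
                cross_sales_2[((p1, t1), (p2, t2))] += 1

rule = (("A", 0), ("B", 1))  # buy A in Jan98 => buy B in Feb98
confidence = cross_sales_2[rule] / num_of_buyers_2[("A", 0)]
```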

As can be appreciated by those skilled in the art, other dimensions such as area may be 
added to the conjoint dimensions to specify more refined rules. 

4.5 Mining Functional Association Rules. 

The functional association rule generation module 580 can generate multidimensional 
functional association rules. A multidimensional association rule is functional if its predicates 
include variables, and the variables in the consequent are functions of those in the antecedent. 
For example, functional association rules can be used to answer the following questions, where 
a_month and a_year are variables. 

• What is the percentage of people in California who buy a printer in the next month 
after they bought a PC? i.e. 

[x ∈ Customers: buy_product(x, 'PC', a_month) => buy_product(x, 'printer', a_month+1)] | 
area = 'California'. 

• What is the percentage of people who buy a printer within the year when they bought 
a PC? i.e. 

[x ∈ Customers: buy_product(x, 'PC', a_year) => buy_product(x, 'printer', a_year)] | area = 
'California'. 

To distinguish, the association rules that are not functional are called instance association rules, 
such as: 

[x ∈ Customers: buy_product(x, 'PC', 'Jan98') => buy_product(x, 'printer', 'Feb98')] | area = 
'California'. 

Time variant, functional association rules can be derived from time variant, instance 
association rules through cube restructuring. Let us introduce a new dimension time_delta that 
has values one_day, two_day, ..., at day level, and values one_month, two_month, ..., at month 
level, etc. Then let us consider the following functional association rule related cubes. 

Association cube: CrossSales.3 (product, product2, customer_group, merchant, area, 
time_delta) 
Population cube: NumOfBuyers.3 (product, customer_group, merchant, area) 
Base cube: NumOfShoppers.3 (customer_group, merchant, area) 
Confidence cube: Confidence.3 (product, product2, customer_group, merchant, area, 
time_delta) 
Support cube: Support.3 (product, product2, customer_group, merchant, area, time_delta) 

The association cube CrossSales.3 can be constructed from CrossSales.2 by binning, as illustrated 
below: 

for each <p1, t1> in <product, time> { 
for each <p2, t2> in <product2, time2> { 

dimensioned by merchant, area, add value 

CrossSales.2(<product p1, time t1>, <product2 p2, time2 t2>) 

to the value of 

CrossSales.3(product p1, product2 p2, time_delta t2 - t1) 

} 
} 

The cell values of CrossSales.2 in the selected time and time2 ranges are added to the 
corresponding cells of CrossSales.3. For example, the count value in cell: 

CrossSales.2(<PC, Jan98>, <printer, Feb98>, ...) 

is added to cell (bin): CrossSales.3(PC, printer, one_month, ...) 

It can also be added to cell: CrossSales.3(PC, printer, one_year, ...). 
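
The binning loop above can be sketched in Python as follows. The month index MONTHS, the mapping DELTA_NAMES and the cell counts are hypothetical, only month-level deltas are binned, and the feature dimensions (merchant, area) are omitted for brevity.

```python
from collections import defaultdict

MONTHS = {"Jan98": 0, "Feb98": 1, "Mar98": 2}   # hypothetical month index
DELTA_NAMES = {1: "one_month", 2: "two_month"}  # values of time_delta

# Hypothetical CrossSales.2 cells: ((product, time), (product2, time2)) -> count.
cross_sales_2 = {
    (("PC", "Jan98"), ("printer", "Feb98")): 40,
    (("PC", "Feb98"), ("printer", "Mar98")): 25,
    (("PC", "Jan98"), ("printer", "Mar98")): 10,
}

# CrossSales.3 cells: (product, product2, time_delta) -> count.
cross_sales_3 = defaultdict(int)
for ((p1, t1), (p2, t2)), count in cross_sales_2.items():
    delta = MONTHS[t2] - MONTHS[t1]
    if delta in DELTA_NAMES:
        # Add the instance-rule count to the bin for this time difference.
        cross_sales_3[(p1, p2, DELTA_NAMES[delta])] += count
```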



5. Distributed and Incremental Rule Mining. 

There exist two ways to deal with association rules: 

• Static, that is, to extract a group of rules from a snapshot, or a history, of data and use them "as 
is". 

• Dynamic, that is, to evolve rules from time to time using newly available data. 

Since association rules are mined from an e-commerce data warehouse holding transaction data, 
the data flows in continuously and is processed daily. 

As can be appreciated by those skilled in the art, mining association rules dynamically 
has the following benefits: 

• "Real-time" data mining, that is, the rules are drawn from the latest transactions for 
reflecting the current commercial trends. 

• Multilevel knowledge abstraction, which requires summarizing multiple partial results. 
For example, association rules on the month or year basis cannot be concluded from 
daily mining results. In fact, multilevel mining is incremental by nature. 

• Scalability, as there is no computing resource allowing us to mine a data set with 
arbitrary size, incremental and distributed mining have become a practical choice. 

Incremental association rule mining can be achieved by combining partial results. As 
can be understood by those skilled in the art, the confidence and support of multiple rules may 
not be combined directly. This is why they are treated as "views", and only the 
association cube, the population cube and the base cube, which can be updated from each new copy 
of the volume cube, are maintained. Several cases are described below to show how a GDOS can mine association 
rules by incorporating the partial results computed at LDOSs. 

• The first case is to sum up volume-cubes generated at multiple LDOSs. Let Cv,i be the 
volume-cube generated at LDOS i. The volume-cube generated at the GDOS by combining 
the volume-cubes fed from these LDOSs is Cv = Σi Cv,i. The association rules are then 
generated at the GDOS from the centralized Cv. 

• The second case is to mine local rules with distinct bases at participating LDOSs, resulting 
in a local association cube Ca,i, a local population cube Cp,i and a local base cube Cb,i at each 
LDOS. At the GDOS, multiple association cubes, population cubes and base cubes sent 
from the LDOSs are simply combined, resulting in a summarized association cube, a 
summarized population cube and a summarized base cube, as Ca = Σi Ca,i, Cp = Σi Cp,i, Cb = Σi Cb,i. The 
corresponding confidence cube and support cube can then be derived as described earlier. 
Cross-sale association rules generated from distinct customers belong to this case. 
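
The second case can be illustrated by a short Python sketch, assuming two LDOSs whose customer sets are disjoint. The cube contents and key layouts are hypothetical; the sketch only shows that counts are summed first and that confidence and support are derived afterwards from the summed cubes.

```python
from collections import Counter

# Hypothetical local cubes from two LDOSs with disjoint customer sets.
# Keys: Ca (product, product2, time), Cp (product, time), Cb (time,).
ldos = [
    {"Ca": Counter({("A", "B", "Jan98"): 30}),
     "Cp": Counter({("A", "Jan98"): 60}),
     "Cb": Counter({("Jan98",): 200})},
    {"Ca": Counter({("A", "B", "Jan98"): 15}),
     "Cp": Counter({("A", "Jan98"): 40}),
     "Cb": Counter({("Jan98",): 100})},
]

def merge(name):
    """Cell-wise sum of the named cube over all LDOSs."""
    total = Counter()
    for site in ldos:
        total.update(site[name])  # Counter.update adds counts
    return total

ca, cp, cb = merge("Ca"), merge("Cp"), merge("Cb")

# Confidence and support are views derived from the merged count cubes.
confidence = {k: n / cp[(k[0], *k[2:])] for k, n in ca.items()}
support = {k: n / cb[k[2:]] for k, n in ca.items()}
```

Averaging the two local confidences (0.5 and 0.375) would not give the merged value, which is why only the count cubes are maintained.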

As can be understood by those skilled in the art, it is in general inappropriate to directly 
combine association cubes which cover areas a1, a2, ... to cover a larger area a. In the given 
example, this is so because association cubes record counts of customers that satisfy the 
association condition, and the sets of customers contained in a1, a2, ... are not mutually disjoint. 
This can be seen in the following examples. 

■ A customer who bought A and B in both San Jose and San Francisco, which are covered by different 
LDOSs, contributes a count to the rule covering each city, but has only one count, not two, for the 
rule A => B covering California. 

■ A customer (e.g. Doe in FIG. 7) who bought a TV in San Jose but a VCR in San Francisco 
is not countable for the cross-sale association rule TV => VCR covering either of these cities, but is 
countable for the rule covering California. This is illustrated in FIG. 7. 
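
Both effects can be reproduced with a small Python sketch. The customers and purchases are invented for illustration; the sketch compares the sum of per-city association counts with the count obtained after merging purchases at the customer level.

```python
# Hypothetical purchases per city: customer -> products bought there.
san_jose = {"doe": {"TV"}, "lee": {"TV", "VCR"}, "kim": {"TV", "VCR"}}
san_francisco = {"doe": {"VCR"}, "lee": {"TV", "VCR"}, "kim": {"TV", "VCR"}}

def count_rule(purchases, x, y):
    """Number of customers who bought both x and y."""
    return sum(1 for items in purchases.values() if x in items and y in items)

# Merge at the customer level to obtain the state-wide (California) view.
california = {}
for city in (san_jose, san_francisco):
    for customer, items in city.items():
        california.setdefault(customer, set()).update(items)

# Summing city-level counts double counts lee and kim and misses doe.
summed = count_rule(san_jose, "TV", "VCR") + count_rule(san_francisco, "TV", "VCR")
actual = count_rule(california, "TV", "VCR")
```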

6. In order to scale up association rule mining in e-commerce, the present invention 
provides a distributed and cooperative data-warehouse/OLAP system. This system can generate 
association rules with enhanced expressive power, by combining information of discrete 
commercial activities from different geographic areas, different merchants and over different 
time periods. As described above, the present invention generates scoped association rules, 
association rules with conjoint items, and functional association rules, which are new and useful 
extensions to prior art association rules. 

Once generated by the present invention, association rules may be used to identify 
opportunities for cross-selling products, analyze trends, or identify causes of sudden sales 
increases or decreases. Association rules are important in applications such as personalized 
promotions or inventory management. The present invention defines customer profiles and 
various new classes of multi-dimensional and multi-level association rules (e.g., scoped 
multidimensional rules, rules with conjoint items, and functional rules) that are useful for 
e-commerce applications. Summary information, customer profiles, and the different classes of 
association rules are computed in a distributed, cooperative manner using OLAP tools. 
Summaries, profiles, and rules are incrementally updated as new transaction data is collected. 

The foregoing description has provided an example of the present invention that is 
directed to sales transaction data. It will be appreciated that various modifications and changes 
may be made thereto without departing from the broader scope of the invention as set forth in 



Attorney Docket No. 10991147-1 



the appended claims. For example, the method of generating, updating, and comparing the 
customer profiles of the present invention can be applied to other areas, such as targeted 
marketing and targeted promotions. In applications where there is a very large collection of 
transaction data, the present invention can be utilized to generate customer behavior profiles, 
extract patterns of the activities of the customer, generate association rules, and provide 
guidelines as to how to meet or otherwise service the needs of the customers. 

CLAIMS 

What is claimed is: 

1. A method for generating association rules comprising: 

a) receiving a volume cube that represents the purchase volume of customers; 

b) generating an association cube, a population cube and a base cube based on the 
volume cube; and 

c) deriving a confidence cube and a support cube of an association rule based on the 
association cube, population cube, and the base cube. 

2. The method of claim 1 wherein the step of generating an association cube, a population 
cube and a base cube based on the volume cube includes the step of 

generating an association cube that has at least two levels and at least two dimensions. 

3. The method of claim 1 wherein the step of generating an association cube, a population 
cube and a base cube based on the volume cube includes the step of 

generating a scoped association rule cube; 

wherein the step of deriving a confidence cube and a support cube of an association rule 
based on the association cube, population cube, and the base cube includes the step of 
deriving a confidence cube and a support cube of a scoped association rule based on 
the association cube, population cube, and the base cube. 

4. The method of claim 1 wherein the step of generating an association cube, a population 
cube and a base cube based on the volume cube includes the step of 

generating an association rule with conjoint items cube; 

wherein the step of deriving a confidence cube and a support cube of an association rule 
based on the association cube, population cube, and the base cube includes the step of 
deriving a confidence cube and a support cube of an association rule with conjoint 
items based on the association cube, population cube, and the base cube. 

5. The method of claim 1 wherein the step of generating an association cube, a population 
cube and a base cube based on the volume cube includes the step of 

generating a functional association rule cube; 
wherein the step of deriving a confidence cube and a support cube of an association rule 
based on the association cube, population cube, and the base cube includes the step of 
deriving a confidence cube and a support cube of a functional association rule based on 
the association cube, population cube, and the base cube. 

6. The method of claim 1 wherein steps (a), (b), and (c) are implemented by utilizing 
OLAP programming. 

7. The method of claim 1 wherein step (a) includes the steps of 

a1) receiving a first volume cube that represents the purchase volume of customers for a 
first region; 

a2) receiving a second volume cube that represents the purchase volume of customers for 
a second region; and 

wherein step (b) includes the step of 

b1) generating an association cube, a population cube and a base cube based on the first 
volume cube and the second volume cube. 

8. A data processing system comprising: 

a) a plurality of local stations ("LDOSs") having a local computation engine for mining 
and summarizing the local transaction data and for generating local customer profile cubes; and 

b) at least one global station ("GDOS"), coupled to the plurality of the local stations, the 
global station having a global computation engine for receiving the local customer profiles, 
merging and mining the local profile cubes, and generating global profile cubes and association 
rules based on said local profile cubes, and providing the global profile cubes and the 
association rules to said plurality of LDOSs. 

9. The system according to claim 8, wherein each of said plurality of LDOSs comprises a 
local data warehouse and at least one local OLAP server, 

the local data warehouse being adapted to receive and store said transaction data, 
wherein the local computation engine builds the local profile cubes that contain at least 
partial information regarding customer profiling by periodically mining new transactions 
flowing into said local data warehouse and deriving patterns for local analysis, said local 
computation engine also being adapted to incrementally update said local profile cubes. 
10. The system according to claim 9 wherein said local data warehouse receives and stores 
transaction data in a first predetermined interval and wherein said local OLAP engine generates 
said local profile cubes in a second predetermined interval. 

11. The system according to claim 9 wherein said GDOS comprises a global data warehouse 
and at least one global OLAP server, 

the global data warehouse for receiving and storing the local profile cubes, 
the global computation engine for combining summary information from each of said 
LDOSs to build and incrementally update said global profile cubes and association rules, and for 
providing feedback to said plurality of LDOSs. 

12. The system according to claim 11, wherein said local and global profile cubes comprise 
information of a plurality of customers, said information being derived from transaction data 
with said customers as stored by said local and global data warehouses, said profiling 
information specifying at least the following: kind, product, customer, merchant, time and area. 

13. The system according to claim 12, wherein: 

said local profile cubes are maintained at LDOS and said global profile cubes are 
maintained at GDOS, each of said local profile cubes being populated by mapping values in 
transaction data records into each dimension of said profile cube, each of said global profile 
cubes being retrieved and updated by merging appropriate local profile cubes. 

14. The system according to claim 12, wherein said profile cubes are used to derive a 
plurality of shopping pattern cubes, said shopping pattern cubes comprising: 

shopping behavior of at least one customer; 
shopping patterns based on probability distribution; 
shopping patterns based on volume. 

15. The system according to claim 8, wherein said association rules comprise: 

scoped association rule with different bases, each of the bases being said scoped 
association rule's population over which said scoped association rule is defined; 

multidimensional association rule with "customer" being its base, "products" being its 
item, and "merchant," "area" and "time" being underlying features of said multidimensional 
association rule; and 

multilevel association rule with its features being represented at multiple levels. 

16. The system according to claim 15, wherein said association rules are mined by: 
converting a volume cube into an association cube, a base cube and a population cube, 
said volume cube representing purchase volumes of customers dimensioned by item, base and 
feature; 

deriving a support cube based on said base cube and said association cube; and 
deriving a confidence cube based on said association cube and said population cube. 

17. A method of distributed data processing using on-line analytical processing ("OLAP") 
engines for use with transaction data in electronic commerce, comprising the steps of: 

mining and summarizing, using a plurality of local servers ("LDOSs"), said transaction 
data to generate local profile cubes; 

merging and mining, using at least one global server ("GDOS"), said local profile cubes 
received from said plurality of LDOSs to generate global profile cubes and association rules 
based on said local profile cubes; and 

feeding back said global profile cubes and association rules from said GDOS to said 
plurality of LDOSs for their business applications. 

18. The method according to claim 17, wherein the step of mining and summarizing, using 
LDOSs, comprises: 

receiving and storing said transaction data using a local data warehouse, 

building, using a local OLAP engine, said local profile cubes containing at least partial 
information regarding customer profiling by periodically mining new transactions flowing into 
said local data warehouse and deriving patterns for local analysis; and 

incrementally updating said local profile cubes with the new transactions. 

19. The method according to claim 18 wherein the step of receiving and storing is in a first 
predetermined interval and wherein the step of building said local profiles is in a second 
predetermined interval. 
20. The method according to claim 18 wherein said step of receiving, merging and mining 
by using the GDOS comprises: 

storing said local profile cubes using a global data warehouse; 
combining using a global OLAP engine summary information from each of said LDOSs 
to build and incrementally update said global profile cubes and association rules; and 
feeding back said global profile cube and association rules. 

21. The method according to claim 20 wherein said local and global profile cubes include 
information of a plurality of customers, said information being derived from transaction data 
with said customers as stored by said local and global data warehouses, said profiling 
information specifying at least the following: kind, product, customer, merchant, time and area. 

22. The method according to claim 21 wherein said local profile cubes are maintained at 
LDOS and said global profile cubes are maintained at GDOS, each of said local profile cubes 
being populated by mapping values in transaction data records into each dimension of said 
profile cube, each of said global profile cubes being retrieved and updated by merging 
appropriate local profile cubes. 

23. The method according to claim 21 wherein said profile cubes are used to derive a 
plurality of shopping pattern cubes, said shopping pattern cubes comprising: 

shopping behavior of at least one customer; 
shopping patterns based on probability distribution; 
shopping patterns based on volume. 

24. The method according to claim 17 wherein said association rules comprise: 

scoped association rule with different bases, each of the bases being said scoped 
association rule's population over which said scoped association rule is defined; 

multidimensional association rule with "customer" being its base, "products" being its 
item, and "merchant," "area" and "time" being underlying features of said multidimensional 
association rule; and 

multilevel association rule with its features being represented at multiple levels. 




25. The method according to claim 24, wherein said association rules are mined by: 

converting a volume cube into an association cube, a base cube and a population cube, 
said volume cube representing purchase volumes of customers dimensioned by item, base and 
feature; 

deriving a support cube based on said base cube and said association cube; and 
deriving a confidence cube based on said association cube and said population cube. 
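
The derivation recited in claims 16 and 25 can be illustrated with a minimal sketch. It rests on assumptions that the claims themselves do not state: that the association cube counts the customers who bought both item x and item y, that the population cube counts the customers who bought item x, that the base cube counts all customers in scope, that confidence is association divided by population, and that support is association divided by base. The sparse-dictionary cube representation and the function name `derive_cubes` are likewise hypothetical.

```python
from collections import defaultdict
from itertools import permutations


def derive_cubes(volume):
    """volume maps (item, customer, feature) -> purchased quantity.

    Here "customer" plays the role of the base and "feature" stands in for
    one underlying feature such as area (an assumed simplification).
    """
    buyers = defaultdict(set)     # (item, feature) -> customers who bought it
    customers = defaultdict(set)  # feature -> all customers with a purchase
    for (item, customer, feature), qty in volume.items():
        if qty > 0:
            buyers[(item, feature)].add(customer)
            customers[feature].add(customer)

    # Population cube: number of customers who bought item x.
    population = {key: len(who) for key, who in buyers.items()}
    # Base cube: number of customers in scope for each feature value.
    base = {feature: len(who) for feature, who in customers.items()}

    # Association cube: number of customers who bought both x and y.
    items_by_feature = defaultdict(list)
    for item, feature in buyers:
        items_by_feature[feature].append(item)
    association = {}
    for feature, items in items_by_feature.items():
        for x, y in permutations(items, 2):
            both = buyers[(x, feature)] & buyers[(y, feature)]
            association[(x, y, feature)] = len(both)

    # Assumed definitions: confidence = association / population,
    # support = association / base. Denominators are never zero here.
    confidence = {k: n / population[(k[0], k[2])] for k, n in association.items()}
    support = {k: n / base[k[2]] for k, n in association.items()}
    return association, population, base, confidence, support
```

For example, if customers c1 and c2 bought item A and customers c1 and c3 bought item B in the same area, the rule "A implies B" would have a confidence of 1/2 and a support of 1/3 under these assumed definitions.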
ABSTRACT OF THE DISCLOSURE 

A distributed OLAP-based method and system for generating association rules. An 
architecture is provided for processing transaction data to generate summary information, 
customer profiles, and association rules. The distributed system includes at least two layers of 
data warehouse/OLAP stations: local data-warehouse OLAP stations (LDOSs) and a global 
data-warehouse OLAP station (GDOS). The LDOSs perform local data mining and 
summarization, and the GDOS merges, mines, and summarizes the input data received from 
LDOSs. The summarized data is then utilized by the GDOS to generate association rules that 
can be provided to the LDOSs for business planning. 